Speech translation apparatus, method and program

ABSTRACT

According to one embodiment, a speech translation apparatus includes a speech recognition unit, a translation unit, a search unit and a selection unit. The speech recognition unit successively performs speech recognition to obtain a first language word string. The translation unit translates the first language word string into a second language word string. The search unit searches for at least one similar example and acquires the similar example and a translation example. The selection unit selects, in accordance with a user instruction, at least one of the first language word string associated with the similar example and the second language word string associated with the translation example, as a selected word string.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2012-146880, filed Jun. 29, 2012, the entire contents of which are incorporated herein by reference.

FIELD

Embodiments described herein relate generally to a speech translation apparatus, method and program.

BACKGROUND

With the trend of globalization these days, there is an increasing need for a speech translation device that supports communication between users who speak different languages. In fact, some services providing speech translation functions are already in operation. However, it is difficult to provide speech recognition or machine translation without errors. One existing method prompts a speaker of a target language to point out an incomprehensible translation so that a speaker of a source language can correct the translation or modify what they have said for better understanding.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech translation apparatus according to the first embodiment.

FIG. 2 is an example table of source language examples and target language examples stored in an example storage.

FIG. 3 is a flowchart of the operation of a speech translation apparatus.

FIG. 4 is a flowchart of the example search process.

FIG. 5 is a flowchart of the process of presenting similar examples and translation examples.

FIG. 6 illustrates an example of implementation of the speech translation apparatus.

FIG. 7 illustrates an example of a screen of a touchscreen.

FIG. 8 illustrates the first process of the operation of the speech translation apparatus.

FIG. 9 illustrates the second process of the operation.

FIG. 10 illustrates the third process of the operation.

FIG. 11 illustrates the fourth process of the operation.

FIG. 12 illustrates the fifth process of the operation.

FIG. 13 illustrates the sixth process of the operation.

FIG. 14 illustrates the seventh process of the operation.

FIG. 15 illustrates the first process of the operation when a user on a source language side selects an example.

FIG. 16 illustrates the second process of the operation when a user on a source language side selects an example.

FIG. 17 illustrates an example of a screen when no suitable examples are available.

FIG. 18 illustrates an example table stored in the example storage according to the second embodiment.

FIG. 19 illustrates an example of the operation of the speech translation apparatus according to the second embodiment.

FIGS. 20A and 20B are diagrams of a speech recognition system including the speech translation apparatus according to the third embodiment.

DETAILED DESCRIPTION

Recently, speech translation application software that operates on a device such as a smartphone (a multifunction mobile terminal) has been commercialized. Services that provide the functionality of speech translation have also been implemented. With such application software and services, relatively short conversational phrases in language A, for example one or a few sentences, are converted into a string of words by speech recognition, the string is translated into language B by a machine translation module, and the translation is output as speech in language B by a speech synthesis module. To use such application software and services, a user who speaks language A is required to speak in short phrases or sentences; at the same time, a user who speaks language B is required to check the translation, listen to the output of speech synthesis, and so on.

Accordingly, conversation between users using conventional speech translation application software often involves waiting, and thus supporting smooth and responsive conversation has been a challenge for conventional translation application software. It is desirable to remove the restriction that requires users to speak in short units; however, no such function has been provided.

Generally, a user who speaks language A (a source language) needs to correct incomprehensible parts in a string of words which is a result of speech recognition. Moreover, a user who receives the translation in language B (a target language) needs to provide feedback on the translation, sentence by sentence. Thus, it is difficult to achieve conversation with good responsiveness.

In general, according to one embodiment, a speech translation apparatus includes an acquisition unit, a speech recognition unit, a translation unit, a search unit, a selection unit and a presentation unit. The acquisition unit is configured to acquire speech in a first language as a speech signal. The speech recognition unit is configured to successively perform speech recognition on the speech signal to obtain a first language word string which is a result of the speech recognition. The translation unit is configured to translate the first language word string into a second language to obtain a second language word string which is a result of translation. The search unit is configured to search for at least one similar example for each first language word string, and, if there is the similar example, to acquire the similar example and a translation example which is a result of the translation of the similar example in the second language, the similar example indicating a word string that is similar to the first language word string in the first language. The selection unit is configured to select, in accordance with a user instruction, at least one of the first language word string associated with the similar example and the second language word string associated with the translation example, as a selected word string. The presentation unit is configured to present one or more similar examples and one or more translation examples associated with the selected word string.

In the following, the speech translation apparatus, method and program according to the present embodiment are described with reference to the drawings. In the following description, repetitive descriptions of the same constituent elements are avoided for brevity. In the description of the present embodiment, the source language is Japanese and the target language is English; however, translation according to the present embodiment can be carried out between any languages.

First Embodiment

A speech translation apparatus according to the first embodiment is explained with reference to FIG. 1.

The speech translation apparatus 100 according to the first embodiment includes a speech acquisition unit 101, a speech recognition unit 102, a machine translation unit 103, a display 104, an example storage 105, an example search unit 106, a pointing instruction detection unit 107, a word string selection unit 108, and an example presentation unit 109.

The speech acquisition unit 101 acquires a user's speech in a source language (may be referred to as a first language) as speech signals.

The speech recognition unit 102 receives the speech signals from the speech acquisition unit 101, and performs speech recognition on the speech signals to obtain a source language word string as a result of the speech recognition. The speech recognition unit 102 successively carries out speech recognition unit by unit while speech signals are input from the speech acquisition unit 101, and every time a source language word string is obtained, it is passed to the next step. A unit for the speech recognition process is determined by pauses, linguistic breaks, points at which a speech recognition candidate is determined, or certain time intervals. A user may be informed by an event that a result of speech recognition is available. As the speech recognition process carried out herein is usual speech recognition, detailed explanation of the speech recognition is omitted.
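
As a non-limiting illustration, the following Python sketch shows one way such recognition units could be cut from the incoming signal using a simple energy-based pause detector. The frame length, thresholds, and function names are assumptions of this sketch and are not prescribed by the embodiment.

    from typing import Iterator, List

    FRAME_MS = 10          # analysis frame length (assumed)
    PAUSE_FRAMES = 30      # ~300 ms of silence ends a unit (assumed)
    MAX_FRAMES = 500       # ~5 s hard limit per unit (assumed)

    def split_units(frames: Iterator[List[float]],
                    silence_threshold: float = 0.01) -> Iterator[list]:
        """Yield one buffered unit whenever a pause or the length limit is hit."""
        unit, silent_run = [], 0
        for frame in frames:
            unit.append(frame)
            # Mean energy of the frame decides whether it counts as silence.
            energy = sum(s * s for s in frame) / len(frame)
            silent_run = silent_run + 1 if energy < silence_threshold else 0
            if silent_run >= PAUSE_FRAMES or len(unit) >= MAX_FRAMES:
                yield unit
                unit, silent_run = [], 0
        if unit:
            yield unit  # flush the trailing, possibly incomplete unit

Each yielded unit would then be handed to the recognizer, which corresponds to passing a source language word string to the next step as described above.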

The machine translation unit 103 receives the source language word string from the speech recognition unit 102, and translates it into a target language (may be referred to as a second language) to obtain a target language word string as a result of the machine translation. As the machine translation process carried out herein is usual machine translation, detailed explanation of the machine translation is omitted.

The display 104 is, for example, a liquid crystal display (LCD). The display 104 receives the source language word string from the speech recognition unit 102 and the target language word string from the machine translation unit 103, and displays the source language word string and the target language word string. In addition, the display 104 receives a similar example and a translation example from the example presentation unit 109 (described later), and displays those examples. A similar example is an example in a source language similar to a source language word string. A translation example is a translation of a similar example.

The example storage 105 stores examples in a source language and examples in a target language, and the source language examples are associated with the target language examples. The source language examples and the target language examples stored in the example storage 105 will be explained later with reference to FIG. 2.

The example search unit 106 receives the source language word string from the speech recognition unit 102, and searches the source language examples accumulated in the example storage 105 for an example similar to the received source language word string.

The pointing instruction detection unit 107 acquires point information corresponding to a point indicated by a user on the display 104.

The word string selection unit 108 receives point information from the pointing instruction detection unit 107, and selects a pointed portion in the source language word string or the target language word string as a selected word string.

The example presentation unit 109 receives the selected word string from the word string selection unit 108, and receives a similar example and a translation example related to the selected word string from the example search unit 106. The example presentation unit 109 displays the similar example and the translation example on the display 104. The example presentation unit 109 displays the selected word string, the selected similar example and the translation example with emphasis.

Next, an example of source language examples and target language examples stored in the example storage 105 is described with reference to FIG. 2.

As shown in FIG. 2, a source language example 201 and a target language example 202 associated with the source language example 201 are stored. For example, a source language word string “(amari aruke nai)” and a translation “I can't walk such a long distance” are stored in the example storage 105.
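
As a non-limiting illustration, the paired storage of FIG. 2 might be modeled as follows. The class and variable names are hypothetical, and the sample entry reuses the romanized string quoted above.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ExamplePair:
        source: str   # source language example (column 201)
        target: str   # target language example (column 202)

    # One row of the FIG. 2 table, with the Japanese shown in romaji.
    EXAMPLE_STORAGE = [
        ExamplePair(source="(amari aruke nai)",
                    target="I can't walk such a long distance"),
    ]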

The operation of the speech translation apparatus 100 according to the present embodiment is explained with reference to the flowchart in FIG. 3. Although not shown in the flowchart, the speech recognition unit 102 and the machine translation unit 103 operate in parallel; accordingly, the speech recognition unit 102 and the machine translation unit 103 need to be activated prior to the process shown in FIG. 3.

In step S301, the speech recognition unit 102 performs speech recognition to obtain a source language word string.

In step S302, the display 104 displays the source language word string.

In step S303, the machine translation unit 103 performs machine translation to obtain a target language word string.

In step S304, the display 104 displays the target language word string. It is possible not to show the source language word string at step S302. Instead, the source language word string may be displayed together with the target language word string only after the target language word string is obtained.

In step S305, the example search unit 106 carries out the example search process. The process will be explained later with reference to the flowchart in FIG. 4.

In step S306, the pointing instruction detection unit 107 detects whether or not there is an instruction from a user, i.e., a pointing at a target language word string whose meaning is unclear. In a case where the display 104 is a touchscreen, the instruction from a user is detected if a user touches a sign indicating that similar examples and translation examples are available. If a user instruction is detected, the process proceeds to step S307; if no user instruction is detected, the process returns to step S301, and the same process is repeated.

In step S307, the speech recognition unit 102 stops the speech recognition temporarily.

In step S308, the example presentation unit 109 presents examples. The process of presenting examples will be explained later with reference to the flowchart in FIG. 5.

In step S309, the speech recognition unit 102 resumes the speech recognition, and the process is repeated from step S301. After this stage, when there is no more speech input, or when a user instructs to stop the speech recognition, the operation of the speech translation apparatus is stopped.
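
As a non-limiting illustration, the control flow of steps S301 to S309 could be sketched as below, assuming the units of FIG. 1 are exposed as objects. Every method name here is hypothetical.

    def translation_loop(speech, recognizer, translator, searcher,
                         detector, presenter, display):
        """One pass per recognition unit, mirroring the loop of FIG. 3."""
        for signal in speech:                       # runs until speech ends
            source = recognizer.recognize(signal)   # S301
            display.show_source(source)             # S302
            target = translator.translate(source)   # S303
            display.show_target(target)             # S304
            searcher.search_and_mark(source)        # S305 (icon if found)
            if detector.user_pointed():             # S306
                recognizer.pause()                  # S307
                presenter.present_examples()        # S308 (FIG. 5)
                recognizer.resume()                 # S309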

Next, the details of the operation at step S305 are explained with reference to the flowchart shown in FIG. 4.

In step S401, the example search unit 106 receives the source language word string.

In step S402, the example search unit 106 searches the examples stored in the example storage 105 for an example similar to the obtained source language word string. To search for a similar example, an edit distance between the source language word string and each source language example is calculated, and if the edit distance is not more than a threshold, the example can be determined to be similar to the source language word string. It is also possible to determine that an example is similar to the source language word string if the number of morphological matches between the example and the source language word string is not less than a threshold. If there is a similar example, the process proceeds to step S403. If there is no similar example, the process at steps S305 and S306 is completed.
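
As a non-limiting illustration, the edit distance criterion described above might be implemented as follows, assuming the ExamplePair structure sketched earlier. The word-level tokenization and the threshold value are assumptions of this sketch.

    def edit_distance(a: list, b: list) -> int:
        """Word-level Levenshtein distance, computed with one DP row."""
        dp = list(range(len(b) + 1))
        for i, wa in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, wb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                         dp[j - 1] + 1,      # insertion
                                         prev + (wa != wb))  # substitution
        return dp[-1]

    def find_similar(word_string: str, examples, threshold: int = 2):
        """Return stored examples whose distance is not more than the threshold."""
        words = word_string.split()
        return [ex for ex in examples
                if edit_distance(words, ex.source.split()) <= threshold]

A count of matching morphemes could be substituted for the distance test, as the paragraph above notes.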

In step S403, the example presentation unit 109 puts an icon on the display 104 to indicate which source language word string has a similar example and which target language word string associated with the source language word string has a translation example.

Next, the process of presenting similar examples and translation examples at step S308 is explained with reference to the flowchart in FIG. 5. Hereinafter, both similar examples and translation examples will be referred to as examples, unless otherwise specified.

In step S501, the example presentation unit 109 displays an example with a notification. The notification is a message indicating that a user wants to check the meaning. One example or a list of examples can be displayed. In the list, it is possible to display, for example, the top five examples with high similarity to the result of speech recognition, all available examples, or examples selected in accordance with the history of the examples presented in the past.

In step S502, the pointing instruction detection unit 107 detects whether or not an example on the list is pointed at; in other words, the unit detects whether or not a user selected an example. If an example is selected, the process proceeds to step S503; if no example is selected, the process proceeds to step S504.

In step S503, the example presentation unit 109 displays the selected example with emphasis. More specifically, as a result of pointing at a translation example, a color of the selected translation example is reversed or highlighted, for example. When a translation example is displayed with emphasis, the corresponding similar example is also displayed with emphasis, or vice versa.

In step S504, the example presentation unit 109 presents a confirmation message (or a notification). The confirmation message is a message that requests a user to determine whether or not the selected example is appropriate.

In step S505, the pointing instruction detection unit 107 detects whether or not deletion is instructed. An instruction of deletion is detected when a deletion button is selected, for example. If deletion is instructed, the process proceeds to step S506; if no deletion is instructed, the process returns to step S502, and the same process is repeated.

In step S506, the example presentation unit 109 determines that there is no appropriate example among the presented examples, and the display 104 displays a confirmation message indicating that the translation was not understood by the conversation partner.

In step S507, the pointing instruction detection unit 107 detects whether or not there is a pointing by a user as a response to the confirmation message. If there is a pointing, the process proceeds to step S508; if there is no pointing, the pointing instruction detection unit 107 waits until there is a pointing from a user.

In step S508, the pointing instruction detection unit 107 detects whether or not the pointing from the user indicates confirmation. If the pointing does not indicate confirmation, the process proceeds to step S509; if the pointing indicates confirmation, the process proceeds to step S510.

In step S509, the example presentation unit 109 hides the confirmation message and removes the emphasis put on the selected example. Then the process returns to step S502, and the same process is repeated.

In step S510, the example presentation unit 109 adds the selected example to a suitable area on the display, and presents the selected example.

In step S511, the example presentation unit 109 deletes the source language word string and the target language word string which are the targets of the process.

In step S512, the example presentation unit 109 hides the list of examples displayed at step S501. Thus, the process of presenting examples is finished.

Next, an example of implementation of the speech translation apparatus is explained with reference to FIG. 6.

FIG. 6 shows an example in which the speech translation apparatus 100 according to the present embodiment is implemented on tablet-type hardware. The speech translation apparatus 600 shown in FIG. 6 includes a body 601, a touchscreen display 602 and a microphone 603.

The touchscreen display 602 and the microphone 603 are mounted on the body 601.

The touchscreen display 602 has a pointing function (a pointing instruction detection unit) for detecting contact of a user's fingertip with the screen as a pointing, if the screen is an electrostatic capacitance touchscreen, and a display function (a display) for displaying texts and images.

As a general microphone can be used for the microphone 603, explanation of the microphone is omitted.

Next, an example of screen display on the touchscreen display 602 is explained with reference to FIG. 7.

As shown in FIG. 7, in an example layout of the screen display, a display area 701 on which source language word strings are shown occupies the left half of the screen, and a display area 702 on which target language word strings are shown occupies the right half of the screen. The far right of the screen is a column containing a speech start button 703, a language switch button 704, a delete button 705, and an end button 706.

The speech start button 703 is pointed by a user to instruct the start of speech. The language switch button 704 is pointed by a user to switch between a source language and a target language. The delete button 705 is pointed when deleting examples, etc. The end button 706 is pointed to end speech recognition.

The layout of the buttons is not limited to the layout shown in FIG. 7. For example, a group of buttons can be popped up as needed by the user. The display is not limited to a touchscreen display. For example, a combination of a screen and a keyboard can be adopted.

Next, a specific example of the operation of the speech translation apparatus according to the present embodiment is explained with reference to FIGS. 8 to 14. Here, an operation example using the speech translation apparatus 600 shown in FIG. 6 is explained.

FIG. 8 shows an example of a display when a user speaks in the target language. The example of FIG. 8 shows a machine translation of speech in a target language into a source language. To achieve the machine translation in this example, Japanese as a source language and English as a target language are switched, and the same process as described above is performed. More specifically, when a user utters a speech sound 801, “Have you already been around here?”, a speech recognition result 802-E, “Have you already been around here?”, is displayed on the display area 702, and a machine translation result 802-J, “(kono atari ha mou mawa rare masita ka?)”, which is a Japanese translation of the speech recognition result 802-E, is displayed on the display area 701.

FIG. 9 shows an example of a display when a user speaks in the source language. The speech acquisition unit 101 acquires a speech sound 901, “(mite mawari tain da kedo, amari ha aruki taku nainde, basu tuaa toka ga ii naa),” and the display area 701 successively displays a source language word string 902-J “(mite mawari tai),” 903-J “(amari ha aruki taku nai),” and 904-J “(basu tuaa toka ga ii)” as results of speech recognition. In addition, the display area 702 displays the machine translation results corresponding to the speech recognition results, i.e., a target language word string 902-E “I would like to look around,” 903-E “Amari doesn't want to walk,” and 904-E “a bus tour is good.” An icon 905 indicates that a similar example and a translation example are available. In this example, the target language word string 903-E does not make sense because of an error caused by the machine translation.

FIG. 10 shows an example where the user on the target language side points at the target language word string 903-E that does not make sense. Pointing can be a touch on the icon 905, or a cursor placed on the icon 905. When the icon 905 is pointed at, a message 1002-E and a corresponding message 1002-J are shown on the display. In the example shown in FIG. 10, the message 1002-J “(nan to osshari tai no desyo u ka)? (What would you like to say?)” is shown on the display area 701, and the message 1002-E “Can you understand what your partner wants to say?” is shown on the display area 702.

In FIG. 11, as a result of selection of a target language word string by a user, similar examples of the source language word string are shown on the display area 701, and corresponding translation examples are shown on the display area 702. For example, similar examples 1101-J “(amari aruke nai),” 1102-J “(watasi ha amari aruki taku nai),” and 1103-J “(asita ha aruki tai)” are displayed, together with a translation example 1101-E (corresponding to 1101-J) “I can't walk such a long distance,” 1102-E (corresponding to 1102-J) “I don't want to walk,” and 1103-E (corresponding to 1103-J) “I'd like to walk tomorrow.”

FIG. 12 shows an example in which the user on the target language side selects a translation example. In this example, the translation example 1201-E “I can't walk such a long distance” is selected, and the selected translation example and its corresponding similar example 1201-J are highlighted. When the translation example is selected, a message 1202 “(osshari tai koto ha kono naiyou de yorosii desu ka)? (Is this what you would like to say?)” is displayed on the display area 701 on the source language side. If more than one similar example and translation example are shown, the list of similar examples and translation examples can be scrolled using a scroll bar 1104.

In FIG. 13, the user on the source language side answers whether or not the user accepts the highlighted similar example. More specifically, in the example shown in FIG. 13, the user touches “(hai) (YES)” or “(iie) (NO)” in the message 1202 on the display, or selects one by moving the cursor 1001. Then, the pointing instruction detection unit 107 detects which of “YES” or “NO” the user selected.

In FIG. 14, when the user on the source language side selects “(hai) (YES),” the list of similar examples and translation examples is hidden, the selected similar example and its corresponding translation example are added to the display on the display areas 701 and 702, and the original source language word string and target language word string which contain translation errors are deleted. For example, a strikethrough is put over the source language word string 1401-J “(amari aruki taku nai),” and the similar example “(amari aruke nai)” is displayed above it. Similarly, a strikethrough is put over the target language word string 1401-E “Amari doesn't want to walk,” and the translation example “I can't walk such a long distance” is displayed above it. Thus, even when the target language user does not understand the translation result, if the target language user can select an example, a corresponding example is shown to the source language user. All the source language user needs to do is determine whether or not the selected similar example is appropriate. Therefore, the ability to paraphrase on the source language user side is not required to carry out smooth conversation between the users.

In the above example, the target language user selects a translation example; however, the source language user may also select a similar example. Examples in which the source language user selects a similar example will be explained below with reference to FIGS. 15 and 16.

As shown in FIG. 15, the source language user selects a similar example. In this example, the similar example 1501-J “(watashi wa amari aruki taku nai)” is selected and highlighted. If the similar example 1501-J is selected, the translation example 1501-E “I don't want to walk” displayed in the display area 702 on the target language side is highlighted. At the same time, a confirmation message 1502 “Can you see what your partner wants to say?” is displayed on the display area 702.

In FIG. 16, the target language user points out whether or not the user accepts the highlighted translation example with the cursor 1001, etc. Thus, if there is a sentence having similar examples in a source language word string, the source language user can select a similar example by themselves to paraphrase what they have said.

Next, an example where there are no appropriate similar and translation examples is explained with reference to FIG. 17.

When the target language user or the source language user determines that there is no appropriate example and does not select any example, no example is inserted into the source language word string or target language word string to be processed. Further, the source language word string or target language word string to be processed is deleted, and a confirmation message 1701, such as “(mousiwake ari masen ga, tutawara nakatta you desu) (Unfortunately, your partner could not understand what you said),” is displayed.

In this case, although the content of the target language word string did not get across to the target language user, at least the source language user can know that the machine translation of what they said was not understood by the target language user. Thus, it is possible for the source language user to rephrase what they want to say.

According to the first embodiment as described above, a search for similar examples is conducted for a source language word string, and if there is a similar example and a user selects the similar example, the similar example and the corresponding translation example are displayed. Thus, the users can cooperate to select examples for incomprehensible parts in a source language word string in a speech recognition result and a target language word string in a machine translation result, so that they can understand the incomprehensible parts and have a smooth conversation between different languages. Further, it is possible to stop speech recognition when a translation example is selected and to display the examples, thereby achieving responsive conversation between users.

Second Embodiment

The second embodiment is different from the first embodiment in terms of how the source language examples and the target language examples are stored in the example storage 105. In the second embodiment, a source language example or a target language example is associated with annotations when being stored. When translating from a source language to a target language, sometimes the meaning in the source language is unclear. For example, it can be unclear whether “(kekkou desu)” in Japanese means to decline something or to accept it. Similarly, it can be unclear whether “you are welcome” in English is meant as a greeting or a response to thanks.

Thus, the second embodiment provides a way to show users an example in which the intention of a source language user and the intention of a target language user are correctly reflected, by annotating a source language word string or a target language word string.

The speech translation apparatus according to the second embodiment is the same as the speech translation apparatus 100 according to the first embodiment, except for the examples stored in the example storage 105 and the operation of the example search unit 106.

The example storage 105 stores a source language example in association with an annotation, and a target language example in association with an annotation.

The example search unit 106 checks whether there is any annotation for a similar example when a similar example is available for the target language word string.

Next, an example of a table stored in the example storage 105 according to the second embodiment is explained with reference to FIG. 18.

As shown in FIG. 18, a source language example 1801 is stored in association with an annotation 1802, and a target language example 1803 is stored in association with an annotation 1804. For example, a source language example 1805-J “(kekkou desu)” is associated with an annotation 1805-1 “(daijobu desu),” and a source language example 1806-J “(kekkou desu)” is associated with an annotation 1806-1 “(huyou desu),” and they are stored. Thus, a source language example having multiple meanings is associated with annotations corresponding to each of the meanings.
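
As a non-limiting illustration, the annotated table of FIG. 18 might be modeled as follows, with one entry per annotated meaning of the same surface form. The field names are hypothetical, and the romanized strings are those quoted above.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class AnnotatedExample:
        source: str                  # source language example (1801)
        source_note: Optional[str]   # annotation 1802, if any
        target: str                  # target language example (1803)
        target_note: Optional[str]   # annotation 1804, if any

    # The same source surface form is stored twice, once per meaning,
    # each paired with the translation matching its annotation.
    ANNOTATED_STORAGE = [
        AnnotatedExample("(kekkou desu)", "(daijobu desu)", "that's good", None),
        AnnotatedExample("(kekkou desu)", "(huyou desu)", "no, thank you", None),
    ]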

Herein, for a target language example which is a translation of a source language example with an annotation, a target language translation of the source language example based on the annotation, not a mere translation of the source language example, is stored. For example, a target language example 1805-E “that's good” is stored as a translation of the source language example 1805-J (“(kekkou desu)”) in accordance with the annotation 1805-1 (“(daijobu desu)”). As another example, a target language example 1806-E “no, thank you” is stored as a translation of the source language example 1806-J (“(kekkou desu)”) in accordance with the annotation 1806-1 (“(fuyou desu)”).

If an annotation is available for a target language example, the target language example 1807-E “You're welcome” is associated with the annotation 1807-1 “Welcome to you,” and the target language example 1808-E “You're welcome” is associated with the annotation 1808-1 “Don't mention it.” Herein, as a source language example corresponding to a target language example having such annotations, similarly to the case of a source language example having annotations, a source language example corresponding to the annotation is stored. For example, a translation of the annotation 1807-1 “Welcome to you” in the source language, i.e., a source language example 1807-J “(irrashai mase),” is associated with the target language example 1807-E “You're welcome” and the annotation 1807-1 “Welcome to you,” and is stored.

Similarly, a translation of the annotation 1808-1 “Don't mention it” in the source language, i.e., a source language example 1808-J “(tondemo ari masen),” is associated with the target language example 1808-E “You're welcome” and the annotation 1808-1 “Don't mention it,” and is stored. Thus, if different annotations are available for the same source language example, a translation in accordance with each annotation is stored as a target language example. Conversely, if different annotations are available for the same target language example, a translation in accordance with each annotation is stored as a source language example.

Next, a specific example of the operation of the speech translation apparatus according to the second embodiment is explained with reference to FIG. 19.

FIG. 19 is similar to the example shown in FIG. 11; however, in the example of FIG. 19, annotations are shown in addition to the similar examples when displaying a list of examples. For example, “(kekkou desu)” (“daijobu desu”) and “(kekkou desu)” (“fuyou desu”) are shown in a list of similar examples. It is preferable that an icon 1901 indicating that an annotation is available for a similar example is distinguishable from an icon indicating that no annotation is available. For example, if no annotation is available, the icon may be in white on a dark background, and if an annotation is available, the icon may be in black, so that a user can know that the meaning of a sentence is unclear but an annotation for the sentence is available.

In the example shown in FIG. 19, two similar examples, 1902-J “(kekkou desu)” (“(daijobu desu)”) and 1903-J “(kekkou desu)” (“(fuyou desu)”), are displayed, and the corresponding three translation examples, 1902-E1 “That's fine,” 1902-E2 “All right,” and 1903-E “No, thank you,” are displayed. If a similar example and its annotation are the same, the annotation is displayed when a user selects the similar example corresponding to a translation example.

According to the second embodiment described above, if an annotation is associated with an example, both the example and the annotation are displayed, so that both a target language user and a source language user can see the annotations and select an appropriate example for a vague example.

Third Embodiment

The above-described first and second embodiments are assumed to be implemented in a single device. However, the process may be divided and performed by multiple devices. In the third embodiment, it is assumed that the process is realized by cooperation between a server and a client.

Generally, when speech translation is performed on a client device with limited calculation and storage resources, such as a mobile phone or a tablet computer, the data amount and search space are limited. Accordingly, by performing speech recognition, machine translation and example search, which impose a heavy processing load, on a server for which calculation resources and storage resources can be easily extended, the amount of processing on the client side can be reduced.

Herein, referring to the block diagram shown in FIG. 20, the speech recognition system including a speech translation apparatus according to the third embodiment is explained.

The speech recognition system shown in FIG. 20 includes a server 2000 and a client 2500.

The server 2000 includes a speech recognition unit 2001, a machine translation unit 2002, an example search unit 2003, an example storage 2004, a server communication unit 2005, and a server control unit 2006.

The explanation of the speech recognition unit 2001, the machine translation unit 2002, the example search unit 2003 and the example storage 2004 is omitted, as the operation of those units is similar to the operation of the speech recognition unit 102, the machine translation unit 103, the example search unit 106 and the example storage 105 according to the first embodiment.

The server communication unit 2005 communicates data with a client communication unit 2506 which will be described later.

The server control unit 2006 controls the entire operation of the server.

The client 2500 includes a speech acquisition unit 2501, a display 2502, a pointing instruction detection unit 2503, a word string selection unit 2504, an example presentation unit 2505, a client communication unit 2506 and a client control unit 2507.

The explanation of the speech acquisition unit 2501, the display 2502, the pointing instruction detection unit 2503, the word string selection unit 2504 and the example presentation unit 2505 is omitted, as the operation of those units is the same as the operation of the speech acquisition unit 101, the display 104, the pointing instruction detection unit 107, the word string selection unit 108 and the example presentation unit 109 according to the first embodiment.

The client communication unit 2506 communicates data with the server communication unit 2005.

The client control unit 2507 controls the entire client 2500.

Next, an example of the speech translation performed by the server 2000 and the client 2500 is explained.

At the client 2500, the speech acquisition unit 2501 acquires speech from a user, and the client communication unit 2506 transmits speech signals to the server 2000.

At the server 2000, the server communication unit 2005 receives the speech signals from the client 2500, and the speech recognition unit 2001 performs speech recognition on the received speech signals. Then, the machine translation unit 2002 performs machine translation on the speech recognition result. The server communication unit 2005 transmits the speech recognition result and the machine translation result to the client 2500. Further, the example search unit 2003 searches for examples similar to the speech recognition result, and if a similar example is available, the similar example and a corresponding translation example are transmitted to the client 2500.

At the client 2500, the client communication unit 2506 receives the speech recognition result and the machine translation result, and the similar example and the translation example corresponding to those results, and the display 2502 displays the speech recognition result and the machine translation result. If the pointing instruction detection unit 2503 detects an instruction from a user, the example presentation unit 2505 presents a translation example and a similar example related to the selected word string.

There is a case where, if similar examples are available for a speech recognition result, the client 2500 receives a predetermined number of extracted similar examples and corresponding translation examples rather than all of the similar examples. In this case, the client 2500 transmits a request to the server 2000 to receive other similar examples that it has not yet received, or translation examples corresponding to those similar examples. The example search unit 2003 of the server 2000 extracts a similar example that has not yet been extracted and a corresponding translation example, and the server communication unit 2005 transmits the similar example and the translation example. At the client 2500, the client communication unit 2506 receives the similar example and the translation example, and displays the new similar example and translation example.

It is also possible that the server 2000 transmits to the client 2500 only a flag indicating that a similar example is available. At the client 2500, when a pointing from a user is detected, a request for a similar example and a translation example related to the selected word string is sent to the server 2000, and the server 2000 transmits a similar example and a translation example in accordance with the request to the client 2500. With this configuration, the search for examples is performed only when needed, and thus the speed of speech translation can be improved on the client side.
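
As a non-limiting illustration, this flag-and-fetch variant could be sketched as below. The endpoint URL, the JSON fields, and the function names are assumptions of this sketch, not part of the embodiment.

    import json
    from urllib import request

    SERVER = "http://translation-server.example/api"  # hypothetical endpoint

    def recognize_and_translate(client_speech: bytes) -> dict:
        """Send speech; receive result text plus a has_examples flag."""
        req = request.Request(
            f"{SERVER}/translate", data=client_speech,
            headers={"Content-Type": "application/octet-stream"})
        # Assumed shape: {"source": ..., "target": ..., "has_examples": bool}
        return json.load(request.urlopen(req))

    def fetch_examples(selected_word_string: str) -> list:
        """Ask the server for examples only after the user points at the icon."""
        body = json.dumps({"query": selected_word_string}).encode()
        req = request.Request(
            f"{SERVER}/examples", data=body,
            headers={"Content-Type": "application/json"})
        return json.load(request.urlopen(req))["examples"]

In this sketch the client calls fetch_examples only when the flag was true and a pointing has been detected, which is the deferred search described above.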

According to the third embodiment described above, the speech recognition, machine translation and example search, which impose a heavy processing load, are performed on a server for which calculation resources and storage resources can be easily extended; as a result, the processing load on the client can be reduced.

The flowcharts of the embodiments illustrate methods and systems according to the embodiments. It will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be loaded onto a computer or other programmable apparatus to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions specified in the flowchart block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process which provides steps for implementing the functions specified in the flowchart block or blocks.

While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.

1. (canceled)
2. A translation apparatus comprising: a translation unit configured to translate a first language word string of a first language into a second language to obtain a second language word string which is a result of translation; a search unit configured to search for at least one similar example for each first language word string, and, if there is the similar example, to acquire the similar example and a translation example which is a result of the translation of the similar example in the second language, the similar example indicating a word string that is similar to the first language word string in the first language; a selection unit configured to select, in accordance with a user instruction, at least one of the first language word string associated with the similar example and the second language word string associated with the translation example, as a selected word string; and a presentation unit configured to present one or more similar examples and one or more translation examples associated with the selected word string.
3. The apparatus according to claim 2, further comprising a display configured to display each of the first language word string and the similar example, and the second language word string and the translation example, wherein the presentation unit causes the display to display a first icon indicating that there is an example associated with the first language word string and corresponding second language word string, if the first language word string has a similar example.
4. The apparatus according to claim 2, wherein the presentation unit presents a list of a plurality of the similar examples and the translation examples if the selected word string is selected.
5. The apparatus according to claim 2, wherein the presentation unit highlights both the similar examples and the translation examples if the similar example or the translation example is selected, and further presents a first notification to prompt a user to determine whether or not the highlighted similar example or the highlighted translation example is appropriate.
6. The apparatus according to claim 2, further comprising a storage configured to store the similar example and the translation example in association with each other.
7. The apparatus according to claim 6, wherein the storage stores the similar example, the translation example, and an annotation for at least one of the similar example and the translation example in association with each other.
8. The apparatus according to claim 7, wherein if the first language word string has a first similar example and there is the annotation associated with the first similar example, the presentation unit causes the display to display a second icon in association with the first language word string and the corresponding second language word string to indicate that the annotation is available.
9. The apparatus according to claim 3, wherein the presentation unit causes the display to display a second notification to prompt a user to confirm the first language word string if the second language word string is selected.
10. A translation apparatus comprising: a display configured to display a first language word string of a first language, and a second language word string which is a translation of the first language word string; a detection unit configured to detect a location on the display indicated by a user; a selection unit configured to select, in accordance with the location, at least one of the first language word string and the second language word string; and a presentation unit configured to present one or more similar examples which are examples in the first language and similar to the first language word string and one or more translation examples which are translations of the similar examples in the second language, wherein the display further displays the presented similar examples and translation examples.
11. A translation method comprising: translating a first language word string of a first language into a second language to obtain a second language word string which is a result of translation; searching for at least one similar example for each first language word string, and, if there is the similar example, acquiring the similar example and a translation example which is a result of the translation of the similar example in the second language, the similar example indicating a word string that is similar to the first language word string in the first language; selecting, in accordance with a user instruction, at least one of the first language word string associated with the similar example and the second language word string associated with the translation example, as a selected word string; and presenting one or more similar examples and one or more translation examples associated with the selected word string.
12. The method according to claim 11, further comprising displaying, at a display, each of the first language word string and the similar example, and the second language word string and the translation example, wherein the presenting the one or more similar examples causes the display to display a first icon indicating that there is an example associated with the first language word string and corresponding second language word string, if the first language word string has a similar example.
13. The method according to claim 11, wherein the presenting the one or more similar examples presents a list of a plurality of the similar examples and the translation examples if the selected word string is selected.
14. The method according to claim 11, wherein the presenting the one or more similar examples highlights both the similar examples and the translation examples if the similar example or the translation example is selected, and further presents a first notification to prompt a user to determine whether or not the highlighted similar example or the highlighted translation example is appropriate.
15. The method according to claim 11, further comprising storing, in a storage, the similar example and the translation example in association with each other.
16. The method according to claim 15, wherein the storing in the storage stores the similar example, the translation example, and an annotation for at least one of the similar example and the translation example in association with each other.
17. The method according to claim 16, wherein if the first language word string has a first similar example and there is the annotation associated with the first similar example, the presenting the one or more similar examples causes the display to display a second icon in association with the first language word string and the corresponding second language word string to indicate that the annotation is available.
18. The method according to claim 12, wherein the presenting the one or more similar examples causes the display to display a second notification to prompt a user to confirm the first language word string if the second language word string is selected.
19. A non-transitory computer readable medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method comprising: translating a first language word string of a first language into a second language to obtain a second language word string which is a result of translation; searching for at least one similar example for each first language word string, and, if there is the similar example, acquiring the similar example and a translation example which is a result of the translation of the similar example in the second language, the similar example indicating a word string that is similar to the first language word string in the first language; selecting, in accordance with a user instruction, at least one of the first language word string associated with the similar example and the second language word string associated with the translation example, as a selected word string; and presenting one or more similar examples and one or more translation examples associated with the selected word string.
20. The medium according to claim 19, further comprising displaying, at a display, each of the first language word string and the similar example, and the second language word string and the translation example, wherein the presenting the one or more similar examples causes the display to display a first icon indicating that there is an example associated with the first language word string and corresponding second language word string, if the first language word string has a similar example.
21. The medium according to claim 19, wherein the presenting the one or more similar examples presents a list of a plurality of the similar examples and the translation examples if the selected word string is selected.